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Abstract 

In risk management it is desirable to grasp the essential statistical features of a time series rep- 
resenting a risk factor. This tutorial aims to introduce a number of different stochastic processes 
that can help in grasping the essential features of risk factors describing different asset classes or 
behaviors. This paper does not aim at being exhaustive, but gives examples and a feeling for practi- 
cally implementable models allowing for stylised features in the data. The reader may also use these 
models as building blocks to build more complex models, although for a number of risk management 
applications the models developed here suffice for the first step in the quantitative analysis. The 
broad qualitative features addressed here are fat tails and mean reversion. We give some orientation 
on the initial choice of a suitable stochastic process and then explain how the process parameters 
can be estimated based on historical data. Once the process has been calibrated, typically through 
maximum likelihood estimation, one may simulate the risk factor and build future scenarios for the 
risky portfolio. On the terminal simulated distribution of the portfolio one may then single out 
several risk measures, although here we focus on the stochastic processes estimation preceding the 
simulation of the risk factors Finally, this first survey report focuses on single time series. Correlation 
or more generally dependence across risk factors, leading to multivariate processes modeling, will be 
addressed in future work. 
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1 Introduction 

In risk management and in the rating practice it is desirable to grasp the essential statistical features 
of a time series representing a risk factor to begin a detailed technical analysis of the product or the 
entity under scrutiny. The quantitative analysis is not the final tool, since it has to be integrated with 
judgemental analysis and possible stresses, but represents a good starting point for the process leading 
to the final decision. 

This tutorial aims to introduce a number of different stochastic processes that, according to the 
economic and financial nature of the specific risk factor, can help in grasping the essential features of 
the risk factor itself. For example, a family of stochastic processes that can be suited to capture foreign 
exchange behaviour might not be suited to model hedge funds or a commodity. Hence there is need 
to have at one's disposal a sufficiently large set of practically implementable stochastic processes that 
may be used to address the quantitative analysis at hand to begin a risk management or rating decision 
process. 

This report does not aim at being exhaustive, since this would be a formidable task beyond the scope 
of a survey paper. Instead, this survey gives examples and a feeling for practically implementable models 
allowing for stylised features in the data. The reader may also use these models as building blocks to build 
more complex models, although for a number of risk management applications the models developed here 
suffice for the first step in the quantitative analysis. 

The broad qualitative features addressed here arc fat tails and mean reversion. This report begins 
with the geometric Brownian motion (GBM) as a fundamental example of an important stochastic process 
featuring neither mean reversion nor fat tails, and then it goes through generalisations featuring any of the 
two properties or both of them. According to the specific situation, the different processes are formulated 
either in (log-) returns space or in levels space, as is more convenient. Levels and returns can be easily 
converted into each other, so this is indeed a matter of mere convenience. 

This tutorial will address the following processes: 

• Basic process: Arithmetic Brownian Motion (ABM, returns) or GBM (levels). 

• Fat tails processes: GBM with lognormal jumps (levels), ABM with normal jumps (returns), 
GARCH (returns), Variance Gamma (returns). 

• Mean Reverting processes: Vasicek, CIR (levels if interest rates or spreads, or returns in 
general), exponential Vasicek (levels). 

• Mean Reverting processes with Fat Tails: Vasicek with jumps (levels if interest rates or 
spreads, or returns in general), exponential Vasicek with jumps (levels). 

Different financial time series are better described by different processes above. In general, when first 
presented with a historical time series of data for a risk factor, one should decide what kind of general 
properties the series features. The financial nature of the time series can give some hints at whether 
mean reversion and fat tails are expected to be present or not. Interest rate processes, for example, are 
often taken to be mean reverting, whereas foreign exchange series are supposed to be often fat tailed and 
credit spreads feature both characteristics. In general one should: 

Check for mean reversion or stationarity 

Some of the tests mentioned in the report concern mean reversion. The presence of an autoregressive 
(AR) feature can be tested in the data, typically on the returns of a series or on the series itself if this 
is an interest rate or spread series. In linear processes with normal shocks, this amounts to checking 
stationarity. If this is present, this can be a symptom for mean reversion and a mean reverting process 
can be adopted as a first assumption for the data. If the AR test is rejected this means that there is 
no mean reversion, at least under normal shocks and linear models, although the situation can be more 
complex for nonlinear processes with possibly fat tails, where the more general notions of stationarity 
and ergodicity may apply. These are not addressed in this report. 

If the tests find AR features then the process is considered as mean reverting and one can compute 
autocorrelation and partial autocorrelation functions to see the lag in the regression. Or one can go 
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directly to the continuous time model and estimate it on the data through maximum likelihood. In this 
case, the main model to try is the Vasicek model. 

If the tests do not find AR features then the simplest form of mean reversion for linear processes 
with Gaussian shocks is to be rejected. One has to be careful though that the process could still be 
mean reverting in a more general sense. In a way, there could be some sort of mean reversion even under 
non-Gaussian shocks, and example of such a case are the jump-extended Vasicek or exponential Vasicek 
models, where mean reversion is mixed with fat tails, as shown below in the next point. 
Check for fat tails If the AR test fails, either the series is not mean reverting, or it is but with fatter 
tails that the Gaussian distribution and possibly nonlinear behaviour. To test for fat tails a first graphical 
tool is the QQ-plot, comparing the tails in the data with the Gaussian distribution. The QQ-plot gives 
immediate graphical evidence on possible presence of fat tails. Further evidence can be collected by 
considering the third and fourth sample moments, or better the skewness and excess kurtosis, to see 
how the data differ from the Gaussian distribution. More rigorous tests on normality should also be 
run following this preliminary analysis 1 . If fat tails are not found and the AR test failed, this means 
that one is likely dealing with a process lacking mean reversion but with Gaussian shocks, that could be 
modeled as an arithmetic Brownian motion. If fat tails are found (and the AR test has been rejected), 
one may start calibration with models featuring fat tails and no mean reversion (GARCH, NGARCH, 
Variance Gamma, arithmetic Brownian motion with jumps) or fat tails and mean reversion (Vasicek 
with jumps, exponential Vasicek with jumps). Comparing the goodness of fit of the models may suggest 
which alternative is preferable. Goodness of fit is determined again graphically through QQ-plot of the 
model implied distribution against the empirical distribution, although more rigorous approaches can be 
applied 2 . The predictive power can also be tested, following the calibration and the goodness of fit tests. 

3 

The above classification may be summarized in the table, where typically the referenced variable is 
the return process or the process itself in case of interest rates or spreads: 





Normal tails 


Fat tails 


NO mean reversion 


ABM 


ABM+ Jumps, 
(N)GARCH, VG 


Mean Reversion 


Vasicek 


Exponential Vasicek 
CIR, Vasicek with Jumps 



Once the process has been chosen and calibrated to historical data, typically through regression 
or maximum likelihood estimation, one may use the process to simulate the risk factor over a given 
time horizon and build future scenarios for the portfolio under examination. On the terminal simulated 
distribution of the portfolio one may then single out several risk measures. This report does not focus 
on the risk measures themselves but on the stochastic processes estimation preceding the simulation of 
the risk factors. In case more than one model is suited to the task, one can analyze risks with different 
models and compare the outputs as a way to reduce model risk. 

Finally, this first survey report focuses on single time series. Correlation or more generally dependence 
across risk factors, leading to multivariate processes modeling, will be addressed in future work. 

Prerequisites 

The following discussion assumes that the reader is familiar with basic probability theory, includ- 
ing probability distributions (density and cumulative density functions, moment generating functions), 
random variables and expectations. It is also assumed that the reader has some practical knowledge of 
stochastic calculus and of Ito's formula, and basic coding skills. For each section, the MATLAB® code 
is provided to implement the specific models. Examples illustrating the possible use of the models on 
actual financial time series are also presented. 

^uch as, for example, the Jarque Bera, the Shapiro- Wilk and the Anderson-Darling tests, that are not addressed in this 
report 

2 Examples are the Kolmogorov Smirnov test, likelihood ratio methods and the Akaike information criteria, as well as 
methods based on the Kullback Leibler information or relative entropy, the Hellingcr Distance and other divergences 

3 The Diebold Mariano statistics can be mentioned as an example for AR processes. These approaches arc not pursued 
here. 
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2 Modelling with Basic Stochastic Processes: GBM 

This section provides an introduction to modelling through stochastic processes. All the concepts will be 
introduced using the fundamental process for financial modelling, the geometric Brownian motion with 
constant drift and constant volatility. The GBM is ubiquitous in finance, being the process underlying 
the Black and Scholes formula for pricing European options. 



2.1 The Geometric Brownian Motion 

The geometric Brownian motion (GBM) describes the random behaviour of the asset price level S(t) over 
time. The GBM is specified as follows: 

dS(t) = rtS(t)dt + <rS(t)dW(t) (1) 

Here W is a standard Brownian motion, a special diffusion process 4 that is characterised by indepen- 
dent identically distributed (iid) increments that are normally (or Gaussian) distributed with zero mean 
and a standard deviation equal to the square root of the time step. Independence in the increments 
implies that the model is a Markov Process, which is a particular type of process for which only the 
current asset price is relevant for computing probabilities of events involving future asset prices. In other 
terms, to compute probabilities involving future asset prices, knowledge of the whole past does not add 
anything to knowledge of the present. 

The d terms indicate that the equation is given in its continuous time version 5 . The property of 
independent identically distributed increments is very important and will be exploited in the calibration 
as well as in the simulation of the random behaviour of asset prices. Using some basic stochastic calculus 
the equation can be rewritten as follows: 

d log S(t) = (^i - ^C7 2 ^ dt + crdW{t) (2) 

where log denotes the standard natural logarithm. The process followed by the log is called an 
Arithmetic Brownian Motion. The increments in the logarithm of the asset value are normally distributed. 
This equation is straightforward to integrate between any two instants, t and u, leading to: 

logS(u)-logS(t)= (/^-^ 2 ) (u-t) + a(W(u)-W(t))~N^-^ (u - t),a 2 (u - t)) . (3) 

Through exponentiation and taking u = T and t = the solution for the GBM equation is obtained: 

1 



S(T) = 5(0) exp 



T + aW{T) (4) 



This equation shows that the asset price in the GBM follows a log-normal distribution, while the 
logarithmic returns \og(S t +At/ S t ) are normally distributed. 

The moments of S(T) are easily found using the above solution, the fact that W(T) is Gaussian with 
mean zero and variance T, and finally the moment generating function of a standard normal random 
variable Z given by: 

E[e aZ ]=ei a2 . (5) 
Hence the first two moments (mean and variance) of S(T) are: 



Also referred to as a Wiener Process. 
5 A continuous-time stochastic process is one where the value of the price can change at any point in time. The theory is 
very complex and actually this differential notation is just short-hand for an integral equation. A discrete-time stochastic 
process on the other hand is one where the price of the financial asset can only change at certain fixed times. In practice, 
discrete behavior is usually observed, but continuous time processes prove to be useful to analyse the properties of the 
model, besides being paramount in valuation where the assumption of continuous trading is at the basis of the Black and 
Scholes theory and its extensions. 
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E [S(T)] = S(0)e" T Var [S(T)} = e 2 " T S 2 (0) [e^ T - l) (6) 

To simulate this process, the continuous equation between discrete instants t$ <t\ < ... <t n needs to 
be solved as follows: 



S(t i+1 ) = S(U)exp 



1 s 

a a 

M 2 



{ti+l —ti) + <7 y/ ti + i — UZi+i 



(7) 



where Z 1; Z 2 ,...Z„ are independent random draws from the standard normal distribution. The 
simulation is exact and does not introduce any discretisation error, due to the fact that the equation 
can be solved exactly. The following charts plot a number of simulated sample paths using the above 
equation and the mean plus/minus one standard deviation and the hrst and 99th percentiles over time. 
The mean of the asset price grows exponentially with time. 



100 200 300 400 500 600 

Time in days 




Figure 1: GBM Sample Paths and Distribution Statistics 



The following Matlab function simulates sample paths of the GBM using equation (7), which was 
vectorised in Matlab. 



Code 1 MATLAB® Code to simulate GBM Sample Paths. 



function S=GBM_simulation(N_Sim ) T ) dt,mu,sigma ) SO) 



mean=(mu-0.5*sigma~2)*dt; 
S= SO* ones (N_Sim ,T+1) ; 



BM= sigma* sqrt (dt ) *normrnd (0,l,M_Sim,T); 
S( : ,2: end)=S0*exp(cumsum(mean+BM ,2)) ; 



2.2 Parameter Estimation 

This section describes how the GBM can be used as an attempt to model the random behaviour of the 
FTSE100 stock index, the log of which is plotted in the left chart of Figure 2. 

The second chart, on the right of Figure 2, plots the quantiles of the log return distribution against 
the quantiles of the standard normal distribution. This QQ-plot allows one to compare distributions 
and to check the assumption of normality. The quantiles of the historical distribution are plotted on the 
Y-axis and the quantiles of the chosen modeling distribution on the X-axis. If the comparison distribution 
provides a good fit to the historical returns, then the QQ-plot approximates a straight line. In the case 
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Figure 2: Historical FTSE100 Index and QQ Plot FTSE 100 



of the FTSE100 log returns, the QQ-plot show that the historical quantiles in the tail of the distribution 
are significantly larger compared to the normal distribution. This is in line with the general observation 
about the presence of fat tails in the return distribution of financial asset prices. Therefore the GBM 
at best provides only a rough approximation for the FTSE100. The next sections will outline some 
extensions of the GBM that provide a better fit to the historical distribution. 

Another important property of the GBM is the independence of returns. Therefore, to ensure the 
GBM is appropriate for modelling the asset price in a financial time series one has to check that the 
returns of the observed data are truly independent. The assumption of independence in statistical terms 
means that there is no autocorrelation present in historical returns. A common test for autocorrelation 
(typically on returns of a given asset) in a time series sample x%, x%, ■ ■ ■ , x n realised from a random process 
X{ti) is to plot the autocorrelation function of the lag k, defined as: 

n — k 

ACF(k) = -. \2{xi-rh){x i+k -m), k = l,2,... (8) 

(n — k)v ~* 

where rh and v are the sample mean and variance of the series, respectively. Often, besides the ACF, 
one also considers the Partial Auto-Correlation function (PACF). Without fully defining PACF, roughly 
speaking one may say that while ACF(fc) gives an estimate of the correlation between X(ti) and X(ti + k), 
PACF(fc) informs on the correlation between X(ti) and X(ti + k) that is not accounted for by the shorter 
lags from 1 to k — 1 , or more precisely it is a sort of sample estimate of: 

Corr(A(<0 - X ^- l+k -\ X{t l+k ) - X t i +l"" ,i+k ~ 1 ) 

where x^ +1 '"'' i+ _1 and _1 are the best estimates (regressions in a linear context) of X{t{) 

and X(ti+k) given X(ti + i), . . . , X(ti + k-i). PACF also gives an estimate of the lag in an autoregressive 
process. 

ACF and PACF for the FTSE 100 equity index returns are shown in the charts in Figure 3. For the 
FTSE100 one does not observe any significant lags in the historical return time series, which means the 
independence assumption is acceptable in this example. 

Maximum Likelihood Estimation 

Having tested the properties of independence as well as normality for the underlying historical data 
one can now proceed to calibrate the parameter set O = (/i, a) for the GBM based on the historical 
returns. To find O that yields the best fit to the historical dataset the method of maximum likelihood 
estimation is used (MLE). 

MLE can be used for both continuous and discrete random variables. The basic concept of MLE, as 
suggested by the name, is to find the parameter estimates for the assumed probability density function 
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Figure 3: Autocorrelation and Partial Autocorrelation Function for FTSE 100 



/e (continuous case) or probability mass function (discrete case) that will maximise the likelihood or 
probability of having observed the given data sample x%, X2, X3, x n for the random vector X\, X n . 
In other terms, the observed sample xi, X2, X3, x« is used inside fxi,x 2 ,...,X n ;@t so that the only variable 
in / is 0, and the resulting function is maximised in 0. The likelihood (or probability) of observing a 
particular data sample, i.e. the likelihood function, will be denoted by £(0). 

In general, for stochastic processes, the Markov property is sufficient to be able to write the likelihood 
along a time series of observations as a product of transition likelihoods on single time steps between 
two adjacent instants. The key idea there is to notice that for a Markov process x t , one can write, again 
denoting / the probability density of the vector random sample: 

M®) = fx(to),x(ti),...,x(t n y,e = fx(t„)\x{t n -i);& ■ fx(t n - l )\x{t n - 2 y,e • ' ' /x(ii)|x(i );e ■ fx(t )-,e (9) 

In the more extreme cases where the observations in the data series are iid, the problem is consid- 
erably simplified, since fx(ti)\X(U-i);& = /x(t»);e an d t ne likelihood function becomes the product of 
the probability density of each data point in the sample without any conditioning. In the GBM case, 
this happens if the maximum likelihood estimation is done on log-returns rather than on the levels. By 
defining the X as 

X(ti) := logSfe) - logS(*,_i), (10) 

one sees that they are already independent so that one does not need to use explicitly the above decom- 
position (9) through transitions. For general Markov processes different from GBM, or if one had worked 
at level space S(U) rather than in return space X(ti), this can be necessary though: see, for example, 
the Vasicek example later on. 

The likelihood function for iid variables is in general: 

n 

£(0) = fe (Xi,X2,X3, ...,X n ) = Y[f® ( X i) ( n ) 

1=1 

The MLE estimate is found by maximising the likelihood function. Since the product of density 
values could become very small, which would cause numerical problems with handling such numbers, 
the likelihood function is usually converted 6 to the log likelihood C* = log£. 7 For the iid case the 
log-likelihood reads: 

n 

£*(©) = log /e(*i) (12) 

i=l 

6 This is allowed, since maxima are not affected by monotone transformations. 

7 The reader has to be careful not to confuse the log taken to move from levels 5 to returns X with the log used to move 
from the likelihood to the log-likelihood. The two are not directly related, in that one takes the log-likelihood in general 
even in situations when there are no log-returns but just general processes. 
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The maximum of the log likelihood usually has to be found numerically using an optimisation algo- 
rithm 8 . In the case of the GBM, the log-returns increments form normal iid random variables, each with 
a well known density f@(x) = f(x; m, v), determined by mean and variance. Based on Equation (3) (with 
u = tj+i and t — U) the parameters are computed as follows: 

At v = o^At (13) 

The estimates for the GBM parameters arc deduced from the estimates of m and v. In the case of 
the GBM, the MLE method provides closed form expressions for m and v through MLE of Gaussian iid 
samples 9 . This is done by differentiating the Gaussian density function with respect to each parameter and 
setting the resulting derivative equal to zero. This leads to following well known closed form expressions 
for the sample mean and variance 10 of the log returns samples Xi for Xi = logS(tj) — logS(tj_i) 

n n 

rh = ^""^ Xj/n v = — rh) 2 jn. (14) 

i=l i=l 

Similarly one can find the maximum likelihood estimates for both parameters simultaneously using 
the following numerical search algorithm. 



Code 2 MAT LAB® MLE Function for iid Normal Distribution. 



function [mu sigma] = GBM_ cal ibr at ion CS , dt , params ) 
Ret=price2ret (S) ; 
n=length (Ret ) ; 

options = optimset ( ' MaxFunEvals ' , 100000, 'Maxlter', 100000); 
f mins ear ch C OnormalLL , params, options); 
function mil = normalLL ( params ) 

mu= params (1) ; sigma=abs (par am s (2) ) ; 

ll=n*log (1/sqrt (2*pi*dt ) / s igma ) + sum ( - ( Ret -mu*dt) . "2/2/ ( dt * sigma "2) ) ; 
mll=-ll; 

end 

end 



An important remark is in order for this important example: given that Xi = log s(tj) — logs(tj-i), 
where s is the observed sample time series for geometric brownian motion S, rh is expressed as: 

- / igg(gcy) - iog(s(M) nt .s 

m = y Xi n — (15) 

i n 

i=l 

The only relevant information used on the whole S sample are its initial and final points. All of the 
remaining sample, possibly consisting of hundreds of observations, is useless. 11 

This is linked to the fact that drift estimation for random processes like GBM is extremely difficult. 
In a way, if one could really estimate the drift one would know locally the direction of the market in 
the future, which is effectively one of the most difficult problems. The impossibility of estimating this 
quantity with precision is what keeps the market alive. This is also one of the reasons for the success of 
risk neutral valuation: in risk neutral pricing one is allowed to replace this very difficult drift with the 
risk free rate. 

8 For example, Excel Solver or fminsearch in MatLab, although for potentially multimodal likelihoods, such as possibly 
the jump diffusion cases further on, one may have to preprocess the optimisation through a coarse grid global method such 
as genetic algorithms or simulated annealing, so as to avoid getting stuck in a local minima given by a lower mode. 

9 Hogg and Craig, Introduction to Mathematical Statistics., fourth edition, p. 205, example 4. 

10 Note that the estimator for the variance is biased, since E(S) = v(n — l)/n. The bias can be corrected by multiplying 
the estimator by n/(n — 1). However, for large data samples the difference is very small. 

11 Jacod, in Statistics of Diffusion Processes: Some Elements, CNR-IAMI 94.17, p. 2, in case v is known, shows that the 
drift estimation can be statistically consistent, converging in probability to the true drift value, only if the number of points 
times the time step tends to infinity. This confirms that increasing the observations between two given fixed instants is 
useless; the only helpful limit would be letting the final time of the sample go to infinity, or let the number of observations 
increase while keeping a fixed time step 
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The drift estimation will improve considerably for mean reverting processes, as shown in the next 
sections. 

Confidence Levels for Parameter Estimates 

For most applications it is important to know the uncertainty around the parameter estimates obtained 
from historical data, which is statistically expressed by confidence levels. Generally, the shorter the 
historical data sample, the larger the uncertainty as expressed by wider confidence levels. 

For the GBM one can derive closed form expressions for the Confidence Interval (CI) of the mean 
and standard deviation of returns, using the Gaussian law of independent return increments and the fact 
that the sum of independent Gaussians is still Gaussian, with the mean given by the sum of the means 
and the variance given by the sum of variances. Let C n = X\ + X2 + X3 + ... + X n , then one knows that 
C n ~ N(nm, nv) 12 . One then has: 

„ / Cn — nm \ ( C„ In — m \ , . , . 



where $() denotes the cumulative standard normal distribution. C n /n, however, is the MLE estimate 
for the sample mean m. Therefore the 95th confidence level for the sample mean is given by: 

— — 1.96-^E < m < — + 1.96-^E (17) 
n \Jn n \Jn 

This shows that as n increases the CI tightens, which means the uncertainty around the parameter 
estimates declines. 

Since a standard Gaussian squared is a chi-squared, and the sum of n independent chi-squared is a 
chi-squared with n degrees of freedom, this gives: 

"(eC^)' <*)-'(=• <•)-*<*>• < 18 > 

hence under the assumption that the returns are normal one finds the following confidence interval 
for the sample variance: 

~x v < v <~x v ( 19 ) 

where q* , q]j are the quantiles of the chi-squared distribution with n degrees of freedom corresponding 
to the desired confidence level. 

In the absence of these explicit relationships one can also derive the distribution of each of the pa- 
rameters numerically, by simulating the sample parameters as follows. After the parameters have been 
estimated (either through MLE or with the best fit to the chosen model, call these the "first parame- 
ters"), simulate a sample path of length equal to the historical sample with those parameters. For this 
sample path one then estimates the best fit parameters as if the historical sample were the simulated 
one, and obtains a sample of the parameters. Then again, simulating from the first parameters, one 
goes through this procedure and obtains another sample of the parameters. By iterating one builds a 
random distribution of each parameter, all starting from the first parameters. An implementation of the 
simulated parameter distributions is provided in the next Matlab function. 

The charts in Figure 4 plot the simulated distribution for the sample mean and sample variance 
against the normal and Chi-squared distribution given by the theory and the Gaussian distribution. In 
both cases the simulated distributions are very close to their theoretical limits. The confidence levels 
and distributions of individual estimated parameters only tell part of the story. For example, in value- 
at-risk calculations, one would be interested in finding a particular percentile of the asset distribution. 
In expected shortfall calculations, one would be interested in computing the expectation of the asset 



12 Even without the assumption of Gaussian log returns increments, but knowing only that log returns increments are 
iid, one can reach a similar Gaussian limit for C n , but only asymptotically in n, thanks to the central limit theorem. 
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Code 3 M AT LAB® Simulating distribution for Sample mean and variance. 

'/.load 'C:\Research\QS CVO Technical Paper \ Report \ Chapter 1 \ f t se 1 OOmonthly 1940 . mat : 

load ' f tsel00monthlyl940 . mat ' 

S=f tsel00monthlyl940data; 

ret = log (S (2: end ,2) . /S (1 : end-1 ,2) ) ; 

n= length C ret ) 

m= sum C ret ) /n 

v = sqrt C sum ((ret-m)."2)/n) 

SimR=normrnd Cm,v , 10000 ,n) ; 

m_dist=sum(SimR ,2)/n; 
m_tmp = repmat Cm_dist , 1 , 799) ; 
v_dist = sum C ( SimR -m_tmp ) . ~ 2 , 2) /n ; 

dt=l/12; 

sigma_dist=sqrt (v_dist/dt) ; 

mu_dist = m_dist/dt+0.5*sigma_dist.~2; 

S0=100 ; 

pct_dist=S0*logninv (0.99 ,sigma_dist , mu_di s t ) ; 
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Figure 4: Distribution of Sample Mean and Variance 



conditional on exceeding such percentile. In the case of the GBM, one knows the marginal distribution 
of the asset price to be log normal and one can compute percentiles analytically for a given parameter 
set. The MATLAB code 2.2, which computes the parameter distributions, also simulates the distribution 
for the 99th percentile of the asset price after three years when re-sampling from simulation according to 
the first estimated parameters. The histogram is plotted in Figure 5. 

2.3 Characteristic and Moment Generating Functions 

Let A be a random variable, either discrete with probability px (first case) or continuous with density 
fx (second case). The k — th moment mfc(A) is defined as: 



m k (X) =E(X k ) 



lT 00 x k f x {x)dx. 



Moments are not guaranteed to exist, for example the Cauchy distribution does not even admit the first 
moment (mean). The moment generating function Mx(t) of a random variable A is defined, under some 
technical conditions for existence, as: 



M{t) = E(e 



E x e tx Px(x) 
lZ o e^fx(x)dx 



D. Brigo, A. Dalessandro, M. Neugebauer, F. Triki: A stochastic processes toolkit for Risk Management 



12 



0.035 




130 140 150 160 170 180 190 200 

99th Percentile with S0=1 00 over a three year period 



Figure 5: Distribution for 99th Percentile of FTSE100 over a three year Period 

and can be seen to be the exponential generating function of its sequence of moments: 

Mx(t)=f:^tK (20) 

fc=0 

The moment generating function (MGF) owes its name to the fact that from its knowledge one may back 
out all moments of the distribution. Indeed, one sees easily that by differentiating j times the MGF and 
computing it in one obtains the j-th moment: 

d? 

—M x (t)\ t=0 = m 3 (X). (21) 
A better and more general definition is that of characteristic function, defined as: 

fe(<)=E(e ia ), 

where i is the unit imaginary number from complex numbers theory. The characteristic function exists for 
every possible density, even for the Cauchy distribution, whereas the moment generating function does not 
exist for some random variables, notably some with infinite moments. If moments exist, one can derive 
them also from the characteristic function through differentiation. Basically, the characteristic function 
forces one to bring complex numbers into the picture but brings in considerable advantages. Finally, it 
is worth mentioning that the moment generating function can be seen as the Laplace transform of the 
probability density, whereas the characteristic function is the Fourier transform. 

2.4 Properties of Moment Generating and Characteristic Functions 

If two random variables, X and Y, have the same moment generating function M(t), then X and Y have 
the same probability distribution. The same holds, more generally, for the characteristic function. 

If X and Y are independent random variables and W — X + Y , then the moment generating function 
of W is the product of the moment generating functions of X and Y: 

M x+Y (t) = E(e t(x+y) ) = E(e tx e tY ) = E(e tx )E(e tY ) = M x (t)M Y (t). 

The same property holds for the characteristic function. This is a very powerful property since the sum 
of independent returns or shocks is common in modelling. Finding the density of a sum of independent 
random variables can be difficult, resulting in a convolution, whereas in moment or characteristic function 
space it is simply a multiplication. This is why typically one moves in moment generating space, computes 
a simple multiplication and then goes back to density space rather than working directly in density space 
through convolution. 
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As an example, if X and Y are two independent random variables and X is normal with mean /xi and 
variance o\ and Y is normal with mean [ii and variance erf, then one knows that W = X + Y is normal 
with mean ii\ + /i 2 and variance cr 2 + o\. In fact, the moment generating function of a normal random 
variable Z ~ J\f(f/,, cr 2 ) is given by: 

M z (t) = 

Hence 2 

Mjr(t) = e^ 11 ^^* 2 and My(t) = e" !f+ T' ! 

From this 

M ff (i) = M x (t)M Y (t) = e ^i*+#* 2 e M2t+^t 2 = e (Pi+P 2 )t+^4^* 2 

one obtains the moment generating function of a normal random variable with mean (fix + /^ 2 ) and 
variance [u\ + erf), as expected. 



3 Fat Tails: GARCH Models 

GARCH 13 models were first introduced by ?. The assumption underlying a GARCH model is that 
volatility changes with time and with the past information. In contrast, the GBM model introduced in 
the previous section assumes that volatility (a) is constant. GBM could have been easily generalised 
to a time dependent but fully deterministic volatility, not depending on the state of the process at any 
time. GARCH instead allows the volatility to depend on the evolution of the process. In particular the 
GARCH(1,1) model, introduced by Bollerslev, assumes that the conditional variance (i.e. conditional on 
information available up to time ti) is given by a linear combination of the past variance cr(^_i) 2 and 
squared values of past returns. 



= fiAti + <j(U)AW(U 



S(U) 

a(U) 2 = ua 2 + aa{t^xf + /3e(t 4 _i) a 

e(U) 2 = (a(ti)AW(ti)) 2 (22) 
where AS'(ij) = S(t i+ i) — S(ti) and the first equation is just the discrete time counterpart of Equation 

If one had just written the discrete time version of GBM, a(U) would just be a known function of time 
and one would stop with the first equation above. But in this case, in the above GARCH model cr(t) 2 is 
assumed itself to be a stochastic process, although one depending only on past information. Indeed, for 
example 

a(t 2 ) 2 = uja 2 + (3e{ti) 2 + auja 2 + a 2 a{t ) 2 

so that ufo) depends on the random variable e{t\) 2 . Clearly conditional on the information at the 
previous time t\ the volatility is no longer random, but unconditionally it is, contrary to the GBM 
volatility. 

In GARCH the variance is an autoregressive process around a long-term average variance rate a 2 . 
The GARCH(1,1) model assumes one lag period in the variance. Interestingly, with uj = and (3 = 1 — a 
one recovers the exponentially weighted moving average model, which unlike the equally weighted MLE 
estimator in the previous section, puts more weight on the recent observations. The GARCH(1,1) model 
can be generalised to a GARCH (p, q) model with p lag terms in the variance and q terms in the squared 
returns. The GARCH parameters can be estimated by ordinary least squares regression or by maximum 
likelihood estimation, subject to the constraint u + a + [3 = 1, a constraint summarising of the fact that 
the variance at each instant is a weighted sum where the weights add up to one. 

As a result of the time varying state-dependent volatility, the unconditional distribution of returns is 
non-Gaussian and exhibits so called "fat" tails. Fat tail distributions allow the realizations of the random 
variables having those distributions to assume values that are more extreme than in the normal/ Gaussian 



13 Generalised Autoregressive Conditional Hetcroscedasticity 
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Figure 6: Historical FTSE100 Return Distribution vs. NGARCH Return Distribution 



case, and are therefore more suited to model risks of large movements than the Gaussian. Therefore, 
returns in GARCH will tend to be "riskier" than the Gaussian returns of GBM. 

GARCH is not the only approach to account for fat tails, and this report will introduce other ap- 
proaches to fat tails later on. 

The particular structure of the GARCH model allows for volatility clustering (autocorrelation in the 
volatility), which means a period of high volatility will be followed by high volatility and conversely 
a period of low volatility will be followed by low volatility. This is an often observed characteristic 
of financial data. For example, Figure 8 shows the monthly volatility of the FTSE100 index and the 
corresponding autocorrelation function, which confirms the presence of autocorrelation in the volatility. 

A long line of research followed the seminal work of Bollerslev leading to customisations and general- 
isations including exponential GARCH (EGARCH) models, threshold GARCH (TGARCH) models and 
non-linear GARCH (NGARCH) models among others. In particular, the NGARCH model introduced by 
? is an interesting extension of the GARCH (1,1) model, since it allows for asymmetric behaviour in the 
volatility such that "good news" or positive returns yield subsequently lower volatility, while "bad news" 
or negative returns yields a subsequent increase in volatility. The NGARCH is specified as follows: 



A5(ti) 



= tiAti + a(ti)AW(ti 



S(U) 

a{U) 2 = cj + aa(i l _i) 2 +/3(e(t i -i)- 7 a(i l _i)) 2 

e(U) 2 = {a(U)AW(t t )) 2 (23) 

NGARCH parameters to, a, and j3 are positive and a, (3, and 7 are subject to the stationarity 
constraint a + (3{\ + 7 2 ) < 1. 

Compared to the GARCH(1,1), this model contains an additional parameter 7, which is an adjustment 
to the return innovations. Gamma equal to zero leads to the symmetric GARCH(1,1) model, in which 
positive and negative e(U-i) have the same effect on the conditional variance. In the NGARCH 7 is 
positive, thus reducing the impact of good news (e(tj_i) > 0) and increasing the impact of bad news 
(e(ij_i) < 0) on the variance. 

MATLAB Routine 4 extends Routine 1 by simulating the GBM with the NGARCH volatility struc- 
ture. Notice that there is only one loop in the simulation routine, which is the time loop. Since the 
GARCH model is path dependent it is no longer possible to vectorise the simulation over the time step. 
For illustration purpose, this model is calibrated to the historic asset returns of the FTSE100 index. In 
order to estimate the parameters for the NGARCH geometric Brownian motion the maximum-likelihood 
method introduced in the previous section is used. 



D. Brigo, A. Dalessandro, M. Neugebauer, F. Triki: A stochastic processes toolkit for Risk Management 



15 



Market News 

I I I I * I I I I I 
Os———— ■ i mi — — I ii li mn i ———a— n ii m i i 

i i i i i i i i * i 

100 200 300 400 500 600 700 800 900 

4 years of daily simulation 

GBM Log returns Volatility with GARCH effects 

0-1 i I I i i i i i i I 

f 0.08 - 
S 0.06 - 
| 0.04 - 
3 0.02 





0.1 — 
§ 0.08 - 
I 0.06 - 
| 0.04 - 
5 0.02 



100 400 500 600 700 

4 years of daily simulation 
GBM Log returns Volatility with N-GARCH effects 



kjaJ 




400 500 600 

4 years of daily simulation 



GBM Log returns 




400 500 600 

4 years of daily simulation 
GBM Log returns with GARCH effects 



400 500 600 

4 years of daily simulation 



800 900 1000 




800 900 1000 



Figure 7: NGARCH News Effect vs GARCH News Effect 



Although the variance is changing over time and is state dependent and hence stochastic conditional on 
information at the initial time, it is locally deterministic, which means that conditional on the information 
at the variance is known and constant in the next step between time ti~\ and U. In other words, 
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Figure 8: Monthly Volatility of the FTSE100 and Autocorrelation Function 



Code 4 MAT LAB® Code to Simulate GBM with NGARCH vol. 



function S = NGARCH_ s imul at ion ( N_Sim ,T,params ,S0) 

mu_ = params CI) ; omega_=params (2) ; alpha_=params (3) ; beta_=params (4) ; gamma_ = params C 5) ; 

S= SO* ones (N_Sim , T) ; 

vO=omega_/(l-alpha_-beta_); 

v = v0*ones (N_Sim , 1) ; 

BM=normrnd (0,l,K_Sim,T); 

for i=2:T 

s igma_ = sqrt (v) ; 
mean=(mu_-0.5*sigma_ ."2) ; 

S ( : , i)=S ( : ,i-l) .*exp(mean+sigma_ . *BM ( : ,i)) ; 

v=omega_+alpha_*v+beta_*(sqrt(v) .*BMC: ,i)- gamma_ *sqrt(v)) ."2; 

end 

end 



the variance in the GARCH models is locally constant. Therefore, conditional on the variance at 
the density of the process at the next instant is still normal 14 . Moreover, the conditional returns are still 
independent. The log-likelihood function for the sample x\, Xi, x n of: 

x . _ AS(ti) 



S(U) 



is given by: 

n 

C*(&) = ]Tlog/e(^-i) (24) 



i=l 



1 ( (Xt-fJ-Y 



feixf, Xi^i) = f N {xi\^, g-j ) = — ^ exp — (25) 

C* (9) = constant + - (- \og{af) - (x t - itf I of) (26) 

where /e is the density function of the normal distribution. Note the appearance of Xi-\ in the func- 
tion, which indicates that normal density is only locally applicable between successive returns, as noticed 
previously. The parameter set for the NGARCH GBM includes five parameters, 9 = (fi, to, a, f3, 7). Due 
to the time changing state dependent variance one no longer finds explicit solutions for the individual 
parameters and has to use a numerical scheme to maximise the likelihood function through a solver. To 
do so one only needs to maximise the term in brackets in (26), since the constant and the factor do not 
depend on the parameters. 



14 Note that the unconditional density of the NGARCH GBM is no longer normal, but an unknown heavy tailed distri- 
bution. 
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Code 5 MAT LAB® Code to Estimate Parameters for GBM NGARCH. 



function [mu_ omega_ alpha_ beta_ gamma_ ] = KG_ cal ibr at i on C dat a , params ) 
returns = pr i ce 2r e t ( dat a ) ; 
returnsLength= length C returns ) ; 

options = optimset ( ' MaxFunEvals ' , 100000, 'Maxlter', 100000); 
f mins ear ch C @NG_ JGBM_LL , params, options); 
function mil = NG_ JGBM_LL ( par ams ) 

mu_ = params CI) ; omega_ = abs (params (2) ) ; alpha_ = abs (par ams (3) ) ; 

beta_ = abs (params (4)) ; gamma_ = par ams ( 5 ) ; 

denum = 1 - alpha_ - bet a_ * ( 1 + gamma_ " 2 ) ; 

if denum < 0; mil = intmax ; return; end /^Variance stationarity test 
var_t = omega_ /denum ; 
innov = returns (1) -mu_ ; 

LogLikelihood = log ( exp ( - innov ~ II var_t II ) / sqrt (pi *2* var_t ) ) ; 
for time=2 : returnsLength 

var_t = omega_ + alpha. * var_t +bet a_ *( innov - gamma_ * sqrt ( var_t ))" 2 ; 

if var_t<0; mil = intmax; return; end; 

innov = returns (time ) -mu_ ; 

LogLikelihood = LogLikelihood + log ( exp ( - innov " 21 var_t /2) / sqrt ( pi *2* var _t ) ) ; 

end 

mil = -LogLikelihood; 

end 

end 



Having estimated the best fit parameters one can verify the fit of the NGARCH model for the FTSE100 
monthly returns using the QQ-plot. A large number of monthly returns using Routine 5 is first simulated. 
The QQ plot of the simulated returns against the historical returns shows that the GARCH model 
does a significantly better job when compared to the constant volatility model of the previous section. 
The quantiles of the simulated NGARCH distribution are much more aligned with the quantiles of the 
historical return distribution than was the case for the plain normal distribution (compare QQ-plots in 
Figures 2 and 6). 

4 Fat Tails: Jump Diffusion Models 

Compared with the normal distribution of the returns in the GBM model, the log-returns of a GBM model 
with jumps are often leptokurtotic 15 . This section discusses the implementation of a jump-diffusion model 
with compound Poisson jumps. In this model, a standard time homogeneous Poisson process dictates 
the arrival of jumps, and the jump sizes are random variables with some preferred distribution (typically 
Gaussian, exponential or multinomial). ? applied this model to option pricing for stocks that can be 
subject to idiosyncratic shocks, modelled through jumps. The model SDE can be written as: 

dS(t) = nS{t)dt + aS(t)dW(t) + S(t)dJ t (27) 
where again Wt is a univariate Wiener process and Jt is the univariate jump process defined by: 

J t = - 1), or dJ(t) = (Y N(t) - l)dN(t), (28) 

j=i 

where (JVt)t>o follows a homogeneous Poisson process with intensity A, and is thus distributed like a 
Poisson distribution with parameter AT. The Poisson distribution is a discrete probability distribution 
that expresses the probability of a number of events occurring in a fixed period of time. Since the Poisson 
distribution is the limiting distribution of a binomial distribution where the number of repetitions of the 
Bernoulli experiment n tends to infinity, each experiment with probability of success A/n, it is usually 
adopted to model the occurrence of rare events. For a Poisson distribution with parameter AT, the 
Poisson probability mass function is: 

cxp(-AT)(ATr 

Jp{x,\T) = j , 35 = 0,1,2,... 

a;! 



Fat-tailed distribution. 
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(recall that by convention 0! = 1). The Poisson process (Nt)t^o is counting the number of arrivals in 
[1, T], and Yj is the size of the j'-th jump. The Y are i.i.d log-normal variables (Yj ~ exp (j\f Uiy, °y))) 
that are also independent from the Brownian motion W and the basic Poisson process N . 

Also in the jump diffusion case, and for comparison with (2), it can be useful to write the equation 
for the logarithm of S, which in this case reads 

d\ogS(t) = ffj,- icr 2 ^j dt + odW(t) + \og{Y N{t) )dN(t) or, equivalently (29) 
dlog5(t) = (n+ XfJ-y - -cr 2 ^ dt + <rdW(t) + [log(Y N(t) )dN(t) - p Y Xdt] 

where now both the jump (between square brackets) and diffusion shocks have zero mean, as is shown 
right below. Here too it is convenient to do MLE estimation in log-returns space, so that here too one 
defines X(ti)'s according to (10). 

The solution of the SDE for the levels S is given easily by integrating the log equation: 

N(T) 

(30) 



S(T) = 5(0) exp (j^-^jT+ aW(T^j JJ Y 3 



Code 6 M AT LAB® Code to Simulate GBM with Jumps. 



function S = JGBM_ s imul at ion C N_Sim , T , dt , par ams , SO ) 

mu_star=params CI) ; s igma_ =params (2) ; lambda_ =params (3) ; 
mu_y_=params (4) ; sigma_y_ =params C5) ; 
M_simul = zero s ( N_Sim , T ) ; 
for t=l:T 

jumpnb = po issrnd ( lambda_ *dt , N_Sim , 1 ) ; 

j ump = normrndCmu_y_*(j umpnb - lamb da _ * dt ) , sqrt ( j umpnb )*sigma_y_) ; 
M_simul(:,t) = mu_star*dt + s igma_ * sqrt C dt ) * randn ( N_Sim , 1 ) + jump; 

end 

S=ret2price (M_simul ' ,S0) ' ; 

end 



The discretisation of the Equation (30) for the given time step At is: 



S(t) = S(t - At) exp ^ i^fi - y J At + o-VAietJ J[ Y o ( 31 ) 

where e ~ M (0, 1) and n t = N t — Nt-At counts the jumps between time t — At and t. When rewriting 
this equation for the log returns 

X{t) := Alog(S(t)) = log(S(t)) - log(S(* - At)) we find: 

X(t) = A log(S(*)) = fi* At + aVAl s t + AJ* (32) 

where the jumps A J t * in the time interval At and the drift /i* are defined as: 

— ( 1 \ 

A J* = log(^) - A At p Y , M * = U + A M y - -a 2 (33) 



so that the jumps A J t * have zero mean. Since the calibration and simulation are done in a discrete time 
framework, we use the fact that for the Poisson process with rate A, the number of jump events occurring 
in the fixed small time interval At has the expectation of AAi. Compute 

E(AJ*) = E ^X^log^j -A At hy =E ]E ^^logY,|n f j j - A At py 
= E (ntfiy) — A At py = 
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Figure 9: The Effect of the Jumps on the Simulated GBM path and on the Density Probability of the 
Log Returns, fi* = 0, a = 0.15, A = 10, fi Y = 0.03, ay = 0.001 



The simulation of this model is best done by using Equation (32). 

Conditional on the jumps occurrence n tl the jump disturbance AJ f * is normally distributed 

Aj;|„ f ~Af({n t - \At)nY,n t o- Y ) . 

Thus the conditional distribution of X(i) — A\og(S(t)) is also normal and the first two conditional 
moments are: 



E(Alog(S(i))|n t ) 
Var(Alog(5(t))|n t ) 



= n*At + (n t - A At)fi Y = (m - o- 2 /2) At + n t fi Y , 
— a 2 At + nt<J Y 



(34) 
(35) 



The probability density function is the sum of the conditional probabilities density weighted by the 
probability of the conditioning variable: the number of jumps. Then the calibration of the model param- 
eters can be done using the MLE on log-returns X conditioning on the jumps. 
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Code 7 MATLAB® Code to Calibrate GBM with Jumps. 



function [mu_star s igma_ lambda_ mu_y_ sigma_y_] = JGBM_calibration(data,dt,params) 
returns = pr i ce 2r e t ( dat a ) ; 
dataLength=length(data) ; 

options = optimset ( ' MaxFunEvals ' , 100000, 'Maxlter', 100000); 
params = f minse ar ch ( @FT_ JGBM_LL , params , options ) ; 
mu_ st ar =params CD ; si gma_ = abs (params (2)) ; lamb da _ = abs (params (3)) ; 
mu_y_=params (4) ; sigma_y_ =abs (params (5) ) ; 

function mil = FT_ JGBM_LL ( par ams ) 

mu_ = params (1) ; sigma_ = abs (params (2)) ; lambda_ = abs (params (3)) ; 
mu_y_ = params (4) ; si gma_y_=abs (params (5)) ; 
Max_ j umps = 5 ; 

factoriel = factorial(0:Max_j umps ) ; 
LogLikelihood = 0; 
for time=l:dataLength 
ProbDensity = 0; 

for j umps =0 : Max_ j umps - 1 

j umpProb = exp(-lambda_*dt )*(lambda_*dt) " j umps /factoriel(jumps + l) ; 

condVol = dt*sigma_~2+jumps*sigma_y_~2; 

condMean = mu_ * dt + mu_y_ * ( j umps - lambda_ * dt ) ; 

condToJumps = exp (-(data(time) -condMean ) "2/ condVol/2)/sqrt (pi*2*condVol ) ; 
ProbDensity = ProbDensity + j umpProb *condToJumps ; 

end 

LogLikelihood = LogLikelihood + log(ProbDensity); 

end 

mil = -LogLikelihood; 

end 

end 



argmax £* = log(£) (36) 

/j-*,/iv,A>0,o->0,cr^>0 

The log- likelihood function for the returns X(t) at observed times t = ti, . . . ,i n with values x± t . . . , x n , 
with At = ti — ti— i, is then given by: 

Th 

£*(©) = ^2 l °gf(xi;V,VY,\,cr,VY) (37) 
i=i 

/ (xi;fj,,[iY, A, cr, cry) = ^2,P{nt = j)fu (ju - cr 2 /2) At + j^ Y ,cr 2 At + ju Y ) (38) 

3=0 

This is an infinite mixture of Gaussian random variables, each weighted by a Poisson probability 
P(nt = j) = fp(j; AAt). Notice that if At is small, typically the Poisson process jumps at most once. In 
that case, the above expression simplifies in 

f(xi]n,ii Y ,X,<T,oy) = (l-\At)f Af (x l ;(fi-cj 2 /2)At,a 2 At) 

+ X At fx {xr, (fi - a 2 j2) At + [i Yl a 2 At + a\) 

i.e. a mixture of two Gaussian random variables weighted by the probability of zero or one jump in At. 

The model parameters are estimated by MLE for FTSE100 on monthly returns and then different 
paths are simulated. The goodness to fit is seen by showing the QQ-plot of the simulated monthly log 
returns against the historical FTSE100 monthly returns in Fig. 10. This plot shows that the Jumps 
model fits the data better then the simple GBM model. 

Finally, one can add markedly negative jumps to the arithmetic Brownian motion return process (29), 
to then exponentiate it to obtain a new level process for S featuring a more complex jump behaviour. 
Indeed, if [iy is larger than one and a\r is not large, jumps in the log-returns will tend to be markedly 
positive only, which may not be realistic in some cases. For details on how to incorporate negative jumps 
and on the more complex mixture features this induces, the reader can refer to Section 10, where the 
more general mean reverting case is described. 
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Figure 10: Historical FTSE100 Return Distribution vs. Jump GBM Return Distribution 



5 Fat Tails Variance Gamma (VG) process 

The two previous sections have addressed the modelling of phenomena where numerically large changes 
in value ( "fat tails" ) are more likely than in the case of the normal distribution. This section presents 
yet another method to accomplish fat tails, and this is through a process obtained by time changing a 
Brownian motion (with drift p, and volatility a) with an independent subordinator, that is an increasing 
process with independent and stationary increments. In a way, instead of making the volatility state 
dependent like in GARCH or random like in stochastic volatility models, and instead of adding a new 
source of randomness like jumps, one makes the time of the process random through a new process, the 
subordinator. A possible related interpretative feature is that financial time, or market activity time, 
related to trading, is a random transform of calendar time. 

Two models based on Brownian subordination are the Variance Gamma (VG) process and Normal 
Inverse Gaussian (NIG) process. They are a subclass of the generalised hyperbolic (GH) distributions. 
They are characterised by drift parameters, volatility parameter of the Brownian motion, and a variance 
parameter of the subordinator. 

Both VG and NIG have exponential tails and the tail decay rate is the same for both VG and NIG 
distributions and it is smaller than for the Normal distribution. 

Therefore these models allow for more flexibility in capturing fat tail features and high kurtosis of 
some historical financial distributions with respect to GBM. 

Among fat tail distributions obtained through subordination, this section will focus on the VG dis- 
tribution, that can also be thought of as a mixture of normal distributions, where the mixing weights 
density is given by the Gamma distribution of the subordinator. The philosophy is not so different from 
the GBM with jumps seen before, where the return process is also a mixture. The VG distribution was 
introduced in the financial literature by 16 ?. 

The model considered for the log-returns of an asset price is of this type: 

d log S(t) = p. dt + 9 dg(t) + a dW{g(t)) S(0) = S (39) 

where p, 9 and a are real constants and a ^ 17 . This model differs from the "usual" notation of Brownian 
motion mainly in the term g t . In fact one introduces gt to characterise the market activity time. One 
can define the 'market time' as a positive increasing random process, g(t), with stationary increments 
g(u) — g(t) for u > t > 0. In probabilistic jargon g(t) is a subordinator. An important assumption is that 
E[g(u) — g(t)] =u — t. This means that the market time has to reconcile with the calendar time between 
t and u on average. Conditional on the subordinator g, between two instants t and u: 

a( W(g(u)) - W(g(t)) )\ g ~ a^(g(u) - g(t))e, (40) 

16 For a more advanced introduction and background on the above processes refer to ??? and ?. 

17 In addition one may say that (39) is a Levy process of finite variation but of relatively low activity with small jumps. 
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where e is a standard Gaussian, and hence: 

(bg(S(«)/5(t)) - fi(u - t))\g ~ N(6(g(u) - g(t)), a 2 (g(u) - g(t))) (41) 

VG assumes {g(t)} ~ r(£,j/), a Gamma process with parameter v that is independent from the 
standard Brownian motion {W^tjt^o- Notice also that the gamma process definition implies {g{u) — 
g(t)} ~ T(^^, v) for any u and t. Increments are independent and stationary Gamma random variables. 
Consistently with the above condition on the expected subordinator, this Gamma assumption implies 
E[g(u) — g(t)] = u — t and Var(g(u) — g{t)) = v{u — t). 

Through iterated expectations, one has easily that: 

E[log(5(u)) - log(£(f))] = (/* + S)(u - t), Var(logS(u) - logS(t)) = {d 2 + 6 2 v){u - t). 

Mean and variance of these log-returns can be compared to the log-returns of GBM in Equation (3), and 
found to be the same if (p + 6) = /i — cr 2 /2 and a 2 + 9 2 v — a 2 . In this case, the differences will be visible 
with higher order moments. 

Code 8 MAT LAB® code that simulate the asset model in eq. (39). 

function S = VG_ s imul at ion C N_Sim , T , dt , par ams , SO ) 

theta = params CI) ; nu = params (2) ; sigma = params (3) ; mu = params (4) ; 
g = gamrnd C dt /nu , nu , N_Sim , T ) ; 

S = SO * cumprod ( [ones (1 ,T) ; exp(mu*dt + theta*g + s igma* sqrt ( g ) . * randn ( N_Sim , T ) ) ] ) ; 



The process in equation (39) is called a Variance- Gamma process. 



5.1 Probability Density Function of the VG Process 

The unconditional density may then be obtained by integrating out g in Eq (41) through its Gamma 
density. This corresponds to the following density function, where one sets u — t =: At and X& t '•= 

io g (s(t + M)/s(t)y. 

f(x At )(x) = f N {x\ 9g, a 2 g)f r (g; ^, vj dg (42) 

As mentioned above, this is again a (infinite) mixture of Gaussian densities, where the weights are given 
by the Gamma densities rather than Poisson probabilities. The above integral converges and the PDF of 
the VG process X& t defined in Eq. (39) is 



2e^ ( \ x -n\ \ At/ - 1/2 ns-pU^l+e^ 

- s Z^>m (7§rfy K -" A * ) <43) 

Here K v (-) is a modified Bessel function of the third kind with index 77, given for to > by 

1 f°° r x i 

k v (x) = 2j Q y^ 1 ex p {- 2 + y) i dy 



Code 9 M AT LAB® code of the variance gamma density function as in equation (43). 



function fx = VGdensityCx, theta, nu , sigma, mu , T ) 

vl = 2* exp ( ( theta* (x-mu) ) / sigma~2) / ( (nu"(T/nu)) * sqrt(2*pi) * sigma * gamma(TVnu) ); 

M2 = (2*sigma~2)/nu + theta~2; 

v3 = abs (x-mu) . /sqrt (M2) ; 

v4 = v3."(T/nu - 0.5) ; 

v6 = ( abs (x-mu) .* sqrt (M2 ))./ sigma~2 ; 

K = besselk (T/nu - 0.5, v6); 

fx = vl . * v4 . *K ; 
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Figure 11: On the left side, VG densities for v = .4, er = .9, p, = and various values of of the parameter 9. 
On the right, simulated VG paths samples. To replicate the simulation, use code (5) with the parameters 
/l = 9 = 0, v = 6 and o = .03. 



5.2 Calibration of the VG Process 

Given a time series of independent log-returns log(S(ti + i)/ S(ti)) (with ti + \—ti = At), say (xi, x%, . . . , x n ), 
and denoting by 7r = {/2, 9, a, v\ the variance-gamma probability density function parameters that one 
would like to estimate, the maximum likelihood estimation (MLE) of 7r consists in finding 7f that max- 
imises the logarithm of the likelihood function C(tt) — nf =1 /(a;i; ir). The moment generating function of 
X A t := log(S(f + At)/S(t)), E[e zX ] is given by: 

/ - 1 \-— 
M x {z) = e flz (l-9vz- -va 2 z 2 } " (44) 

and one obtains the four central moments of the return distribution over an interval of length At: 

E[X] = X = {Jx + 9)At 
E[{X-Xf] = {is9 2 +a 2 )At 
E[(X - X) 3 } = {29 3 v 2 + 3a 2 v9)At 

E[{X-Xf] = (3^ 4 + \29 2 a 2 v 2 + 69 4 v 3 )At + {3a 4 + 69 2 a 2 v + 39 4 is 2 ){At) 2 . 
Now considering the skewness and the kurtosis: 

3v9At 

K = 



((u6 2 + a 2 )At) 3 / 2 

(3w 4 + \29 2 a 2 v 2 + §9 4 v 3 )At + (3ct 4 + &9 2 a 2 v + 39 4 v 2 ){At) 2 



{{v6 2 +a 2 )At) 2 
and assuming 9 to be small, thus ignoring 9 2 , 9 3 , 9 4 , obtain: 

S 



aVAt 

K = 3(1 + ^ 
v At' 
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So the initial guess for the parameters 7Tq will be: 



V 
At 
K 



SaVAl 



3u 



X 
At 



Code 10 MAT LAB® code of the MLE for variance gamma density function in (43). 



'/□function parais = VG_cal ibr at ion C data , dt ) 
% computation of the initial parameters 



data = (EURUSDsv ( : , 1) ) ; 
hist ( dat a ) 
dt = 1 ; 

data = pr i c e2r e t C dat a) ; 



M = mean C dat a ) ; 

V = std(data) ; 

S = skewne s s C dat a ) ; 

K = kurtosis (data) ; 

sigma = sqrtCV/dt); 

nu = (K/3-1) *dt ; 

theta = (S*sigma*sqrt (dt ) ) / (3*nu) ; 
mu = (M/dt) -theta ; 

'/. VG MLE 

pdf_VG = @ C data , theta , nu, sigma, mu) VGdens i t y C dat a , theta, nu , sigma, mu , dt ) ; 

start = [theta, nu , sigma, mu] ; 

lb = [-intmax -intmax] ; 

ub = [intmax intmax intmax intmax] ; 

options = statset ( ' Maxlter ' , 1000 , ' MaxFunEvals ■ , 10000) J 

params = mleCdata, ' pdf ' , pdf _VG , ' start start , 'lower', lb, 'upper^ub, ' opt ions ', opt ions ) ; 




Figure 12: On the left side, VG densities for 6 = 1, v = .4, a — 1.4, /i = and time from 1 to 10 years. 
On the right side, on the y-axis, percentiles of these densities as time varies, x-axis. 

An example can be the calibration of the historical time series of the EUR/USD exchange rate, where 
the VG process is a more suitable choice than the geometric Brownian motion. The fit is performed 
on daily returns and the plot in Fig. 13 illustrates the results. Notice in particular that the difference 
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Code 11 MAT LAB® code to calculate the percentiles of the VG distribution as in fig (12). 



clear all; close all 

lb = 0; ub = 20; 

options = optimsetC' Display',' off', ' TolFun ' ,le-5, 'MaxFunEvals' ,2500) ; 

X = linspace (-15 ,30 ,500) ; 

s igma = 1.4; 

f = fi(x,t) VGdensity(x, 1, .4, sigma, 0, t); 
nT = 10; 

t = linspace (1 , 10 , nT) ; 

pc = [.1 .25 .5 .75 .9 .95 .99] ; 

for k = 1 : nT 

Y(:,k) = VGdensity(X, 1, .4, sigma, 0, t(k) ); 

f = @(x) VGdensityCx , 1, .4, sigma, 0, t(k)); 

VG_PDF = quadl (f , -3 ,30) ; 

for zz = l:length(pc) 

percF = (3(a) quadl (f , -3 , a) - pc(zz); 

PercA(k,zz) = f zero ( percF , 1 , opt i ons ) ; 

end 
end 

subplot (1,2,1) 
plot (X ,Y) ; 

legend ( ' lyr ' , ' 2yr ' , ' 3yr ' , ' 4yr ' , ' 5yr ' , ' 6yr ' , ' 7yr ' , ' 8yr ' , ' 9yr ' , ' lOyr ' ) 
xlim([-5 27] ) ; ylim ( [0 .32]); subplot (1 , 2 , 2) ; plot (t , PercA ) ; 
for zz = l:length(pc) 

text (t (zz) , PercA (zz ,zz) , [-C'\fontsize{9}\color{black}' , num2st r (pc(zz))}] , 'HorizontalAlignment' , . . . 
'center', 'BackgroundColor',[l 1 1]); hold on 

end 

xlim([.5 10]); xlabel (' years ') ; grid on 



between the Arithmetic Brownian motion for the returns and the VG process is expressed by the time 
change parameter v, all other parameters being roughly the same in the ABM and VG cases. 

6 The Mean Reverting Behaviour 

One can define mean reversion as the property to always revert to a certain constant or time varying 
level with limited variance around it. This property is true for an AR(1) process if the absolute value of 
the autoregression coefficient is less than one (\a\ < 1). To be precise, the formulation of the first order 
autoregressive process AR(1) is: 

x L+1 = ri + ax t + ae t+ i:Ax t+ i = (1 - a) ( rZT^ ~ x< ^j + cr£ *+ 1 ( 45 ) 

All the mean reverting behaviour in the processes that are introduced in this section is due to an 
AR(1) feature in the discretised version of the relevant SDE. 

In general, before thinking about fitting a specific mean reverting process to available data, it is 
important to check the validity of the mean reversion assumption. A simple way to do this is to test 
for stationarity. Since for the AR(1) process |a| < 1 is also a necessary and sufficient condition for 
stationarity, testing for mean reversion is equivalent to testing for stationarity. Note that in the special 
case where a = 1, the process behaves like a pure random walk with constant drift, Axt+i = M + cre t+i- 

There is a growing number of stationarity statistical tests available and ready for use in many econo- 
metric packages. 18 The most popular are: 

• the ?, DF test; 

• the Augmented DF (ADF) test of ? test; 



For example Stata, SAS, R, E- Views, etc. 
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Figure 13: On the top chart the historical trend of the EUR/USD is reported. In the middle chart the 
daily return distribution presents fat tails with respect to a normal distribution. The MLE parameters 
for the normal R.V. are /i = —.0002 and a — 0.0061. The QQ-plot in the middle shows clearly the 
presence of tails more pronounced than in the normal case. The bottom chart is the plot of the daily 
return density against the estimated VG one, and the QQ-plot of the fitted daily historical FX return 
time series EUR/USD using a VG process with parameters 9 = —.0001, v = 0.4, a = 0.0061, /2 = —.0001. 

• the ?, PP; 

• the Variance Ratio (VR) test of ? and ?; and 

• the ?, KPSS test. 

All tests give results that are asymptotically valid, and are constructed under specific assumptions. 

Once the stationarity has been established for the time series data, the only remaining test is for the 
statistical significance of the autoregressive lag coefficient. A simple Ordinary Least Squares of the time 
series on its one time lagged version (Equation (45)) allows to assess whether a is effectively significant. 

Suppose that, as an example, one is interested in analysing the GLOBOXX index history. Due to the 
lack of Credit Default Swap (CDS) data for the time prior to 2004, the underlying GLOBOXX spread 
history has to be proxied by an Index based on the Merrill Lynch corporate indices 19 . This proxy index 
has appeared to be mean reverting. Then the risk of having the CDS spreads rising in the mean term to 
revert to the index historical "long-term mean" may impact the risk analysis of a product linked to the 
GLOBOXX index. 

The mean reversion of the spread indices is supported by the intuition that the credit spread is linked 
to the economic cycle. This assumption also has some support from the econometric analysis. To illus- 
trate this, the mean reversion of a Merrill lynch index, namely the EMU Corporate A rated 5-7Y index 20 , 
is tested using the Augmented Dickey-Fuller test (?). 



19 Asset Swap Spreads data of a portfolio of equally weighted Merrill Lynch Indices that match both the average credit 
quality of GLOBOXX, its duration, and its geographical composition. 

20 Bloomberg Tickers ER33 - Asset Swap Spreads data from 01/98 to 08/07. 
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The ADF test for mean reversion is sensitive to outliers 21 . Before testing for mean reversion, the 
index data have been cleaned by removing the innovations that exceed three times the volatility, as they 
are considered outliers. Figure 15 shows the EMU Corporate A rated 5-7Y index cleaned from outliers 
and shows also the autocorrelations and partial autocorrelation plots of its log series. From this chart 
one can deduce that the weekly log spread of the EMU Corporate A rated 5-7Y index shows a clear 
AR(1) pattern. The AR(1) process is characterised by exponentially decreasing autocorrelations and by 
a partial autocorrelation that reaches zero after the first lag (see discussions around Equation (8) for the 
related definitions): 

ACF(k)=a k , k = 0,1,2,...; PACF(k) = a, k = 1; PACF{k) = 0, k = 2, 3,4, ... (46) 

The mean reversion of the time series has been tested using the Augmented Dickey- Fuller test (?). 
The ADF statistics obtained by this time series is -3.7373. The more negative the ADF statistics, the 
stronger the rejection of the unit root hypothesis 22 . The first and fifth percentiles are -3.44 and -2.87. 
This means that the test rejects the null hypothesis of unit root at the 1% significance level. 

Having discussed AR(1) stylised features in general, this tutorial now embarks on describing specific 
models featuring this behaviour in continuous time. 

7 Mean Reversion: The Vasicek Model 

The Vasicek model, owing its name to ?, is one of the earliest stochastic models of the short-term interest 
rate. It assumes that the instantaneous spot rate (or "short rate") follows an Ornstein-Uhlenbeck process 
with constant coefficients under the statistical or objective measure used for historical estimation: 

dx t = a{9 - x t )dt + adW t (47) 

21 For an outlier due to a bad quote, the up and down movement can be interpreted as a strong mean reversion in that 
particular data point and is able to influence the final result. 

22 Before cleaning the data from the outliers the ADF statistic was -5.0226. This confirms the outliers effect on amplifying 
the mean reversion detected level. 
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Variance ot the stationary process AR(1 ) (in bleu) v.s. Variance of the Random Walk (in red) 
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Figure 14: The Random walk vs the AR(1) stationary process. (AR(1): /j, = 0, a = 0.7, a = 1 and 
Random Walk: fi = 0, a = 1) 
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Weekly spread of the EMU Corporate A rated 5-7Y index 
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Figure 15: On the top: the EMU Corporate A rated 5-7Y weekly index from 01/98 to 08/07. On the 
bottom: the ACF and PACF plot of this time series. 



EMU Corporate A rated 5-7Y index Histogram 
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Figure 16: The histogram of EMU Corporate A rated 5-7Y index. 
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with a,9 and a positive and dWt a standard Wiener process. Under the risk neutral measure used for 
valuation and pricing one may assume a similar parametrisation for the same process but with one more 
parameter in the drift modelling the market price of risk; see for example ?. 

This model assumes a mean-reverting stochastic behaviour of interest rates. The Vasicek model has 
a major shortcoming that is the non null probability of negative rates. This is an unrealistic property for 
the modelling of positive entities like interest rates or credit spreads. The explicit solution to the SDE 
(47) between any two instants s and t, with < s < t, can be easily obtained from the solution to the 
Ornstcin-Uhlcnbeck SDE, namely: 

x t =e(l- e - Q (*- s )) + x s e- a{t - s) + ae- at J e au dW u (48) 

The discrete time version of this equation, on a time grid = to, ti, t^, ■ ■ ■ with (assume constant for 
simplicity) time step At = ti — ti—\ is: 



x{U) = c + bx(U- 1 )+5e(t i ) (49) 

where the coefficients are: 

c = 9(l-e~ aAt ) (50) 

b = e- aAt (51) 

and e(t) is a Gaussian white noise (e ~ N (0, 1)). The volatility of the innovations can be deduced by 
the Ito isometry 

5 = aJ(l - e - 2aAt ) /2a (52) 

whereas if one used the Euler scheme to discretise the equation this would simply be a^/At, which is 
the same (first order in At) as the exact one above. 

Equation (49) is exactly the formulation of an AR(1) process (see section 6). Having < b < 1 when 
< a implies that this AR(1) process is stationary and mean reverting to a long-term mean given by 9. 
One can check this also directly by computing mean and variance of the process. As the distribution of 
x(t) is Gaussian, it is characterised by its first two moments. The conditional mean and the conditional 
variance of x(t) given x(s) can be derived from Equation (48): 

¥. s [x t )=e + {x s -e)e- a ^ (53) 

Var,N = ^(l-e- 2a(t - s) ) (54) 

One can easily check that as time increases the mean tends to the long-term value 9 and the variance 
remains bounded, implying mean reversion. More precisely, the long-term distribution of the Ornstein- 
Uhlenbeck process is stationary and is Gaussian with 9 as mean and ^a 2 /2a as standard deviation. 



To calibrate the Vasicek model the discrete form of the process is used, Equation (49). The coefficients 
c, b and 6 are calibrated using the equation (49). The calibration process is simply an OLS regression of 
the time series x(£j) on its lagged form x(ti—%). The OLS regression provides the maximum likelihood 
estimator for the parameters c, b and S. By resolving the three equations system one gets the following 
a, 9 and a parameters: 

a = -\n{b)/At (55) 
9 = c/(l-6) (56) 
a = 5/^/(b 2 - l)At/21n(fe) (57) 

One may compare the above formulas with the estimators derived directly via maximum likelihood 
in ?, where for brevity x(ti) is denoted by Xj and n is assumed to be the number of observations. 
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Figure 17: The distribution of the long-term mean and long-term variance and the 99th percentile 
estimated by several simulations of a Vasicek process with a — 8, 9 = 50 and a = 25 
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b = 




(58) 



En 9 /\-^n \2 




(59) 



1 n 



i=l 



— frxi-i — 9(1 — b) 



— n 2 



(60) 



From these estimators, using (55) and (57) one obtains immediately the estimators for the parameters 
a and cr. 

Code 12 MAT LAB® Calibration of the Vasicek process. 

function ML_ Vas i c ekParams = Vas i cek.cal ibr at i on ( V_dat a , dt ) 
N=length (V_data) ; 

x = [ones (N-l , 1) V_dat a ( 1 : K - 1 ) ] ; 
ols = (x'*x)~(-l)*(x'*V_data(2:N)); 
resid = V_data(2:N) - Hols; 

c=ols (1) ; 
b=ols (2) ; 
delt a= st d C re s id ) ; 

alpha=-log(b)/dt ; 
theta=c/(l-b) J 

sigma=delta/sqrt ((b"2-l)*dt/(2*log(b))) ; 
ML_VasicekParams = [alpha theta sigma]; 



For the Monte Carlo simulation purpose the discrete time expression of the Vasicek model is used. 
The simulation is straightforward using the calibrated coefficients a, 9 and cr. One only needs to draw 
a random normal process et and then generate recursively the new Xt t process, where now ti are future 
instants, using equation (49), starting the recursion with the current value xq. 



8 Mean Reversion: The Exponential Vasicek Model 

To address the positivity of interest rates and ensuring their distribution to have fatter tails, it is possible 
to transform a basic Vasicek process y(t) in a new positive process x(t). 

The most common transformations to obtain a positive number x from a possibly negative number y 
are the square and the exponential functions. 

By squaring or exponentiating a normal random variable y one obtains a random variable x with fat 
tails. Since the exponential function increases faster than the square and is larger (exp(y) >> y 2 for 
positive and large y) the tails in the exponential Vasicek model below will be fatter than the tails in the 
squared Vasicek (and CIR as well). In other terms, a lognormal distribution (exponential Vasicek) has 
fatter tails than a chi-squared (squared Vasicek) or even nonccntral chi-squared (CIR) distribution. 

Therefore, if one assumes y to follow a Vasicek process, 



then one may transform y. If one takes the square to ensure positivity and define x(t) = y(t) , then by 
Ito's formula x satisfies: 



dy t = a(9 — y t )dt + adW t 



(61) 



dx(t) = (B + A 





for suitable constants A, B, C and v. This "squared Vasicek" model is similar to the CIR model that the 
subsequent section will address, so it will not be addressed further here. 
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Figure 18: EMU Corporate A rated 5-7Y index (in dark black) vs two simulated Exponential Vasicek 
paths (in blue and green). The dashed red lines describe the 95th percentile and the 5th percentile of 
the exponential Vasicek distribution. The dark red line is the distribution mean. The two remaining red 
lines describe one standard deviation above and below the mean. 



If, on the other hand, to ensure positivity one takes the exponential, x(t) — exp(y(t)), then one 
obtains a model that at times has been termed Exponential Vasicek (for example in ?). 
Ito's formula this time reads, for x under the objective measure: 



dxt — ax t (m — logx t )dt + ax t dWt 



(62) 



where a and a are positive, and a similar parametrisation can be used under the risk neutral measure. 
Also, m = + £-. 

In other words, this model for x assumes that the logarithm of the short rate (rather than the short 
rate itself) follows an Ornstein-Uhlenbeck process. 

The explicit solution of the Exponential Vasicek equation is then, between two any instants < s < t: 

x t = cxp 1 (l - e - a{t - s) ^ + log(a; s )e- Q(t - s) + ae~ at e au dW u ^ (63) 

The advantage of this model is that the underlying process is mean reverting, stationary in the long 
run, strictly positive and having a distribution skewed to the right and characterised by a fat tail for large 
positive values. This makes the Exponential Vasicek model an excellent candidate for the modelling of the 
credit spreads when one needs to do historical estimation. As shortcomings for the Exponential Vasicek 
model one can identify the explosion problem that appears when this model is used to describe the interest 
rate dynamic in a pricing model involving the bank account evaluation (like for the Eurodollar future 
pricing, ?). One should also point out that if this model is assumed for the short rate or stochastic default 
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intensity, there is no closed form bond pricing or survival probability formula and all curve construction 
and valuation according to this model has to be done numerically through finite difference schemes, 
trinomial trees or Monte Carlo simulation. 

The historical calibration and simulation of the Exponential Vasicek model can be done by calibrating 
and simulating its log-process y as described in the previous Section. To illustrate the calibration, the 
EMU Corporate A rated 5-7Y index described in Section 6 is used. The histogram of the index shows 
a clear fat tail pattern. In addition, the log spread process is tested to be mean reverting (see again 
Section 6). This leads to think about using the Exponential Vasicek model that has a log normal 
distribution and an AR(1) disturbance in the log prices. 

The OLS regression of the log-spread time series on its first lags (Equation (49)) gives the coefficients 
c, b and 6. Then the a, 9 and a parameters in Equations (62) are deduced using equations (55), (56) and 
(57): 

c = 0.3625 a = 4.9701 
b = 0.9054 9 = 3.8307 
5 = 0.1894 a = 1.4061 

Figure 18 shows how the EMU Corporate A rated 5-7Y index time series fits into the simulation of 
50,000 paths using the Equation for the exponential of (49) starting with the first observation of the 
index (first week of 1998). The simulated path (in blue) illustrates that the Exponential Vasicek process 
can occasionally reach values that are much higher than its 95-percentile level. 



9 Mean Reversion: The CIR Model 

Originally the model proposed by ? was introduced in the context of the general equilibrium model of 
the same authors ?. By choosing an appropriate market price of risk, the Xt process dynamics can be 
written with the same structure under both the objective measure and risk neutral measure. Under the 
objective measure one can write 

dx t = a(9 - x t )dt + a^x~ t dW t (64) 

with a, 9, a > 0, and dWt a standard Wiener process. To ensure that x% is always strictly positive one 
needs to impose a 2 ^ 2a9. By doing so the upward drift is sufficiently large with respect to the volatility 
to make the origin inaccessible to Xt- 

The simulation of the CIR process can be done using a recursive discrete version of Equation (64) at 
discretisation times U: 



x(ti) = a9At + (1 - aAi)i(i,_!) + a^Jx^x) At e u (65) 

with e ~ jV(0, 1). If the time step is too large, however, there is risk that the process approaches zero 
in some scenarios, which could lead to numerical problems and complex numbers surfacing in the scheme. 
Alternative positivity preserving schemes are discussed, for example in ?. For simulation purposes it is 
more precise but also more computationally expensive to draw the simulation directly from the following 
exact conditional distribution of the CIR process: 

fciR KK-J = ce- u ~ v (l) q/2 I q (2VH ( 66 ) 

2a 

C = <7 2 (l-e-" At ) (67) 

u = cx u e~ aAt (68) 

v = cxt i _ 1 (69) 



2a6 
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Figure 19: EMU Corporate A rated 5-7Y index (in dark black) vs two simulated CIR paths (in blue 
and green). The dashed red lines describe the 95th percentile and the 5th percentile of the distribution. 
The dark red line is the distribution mean. The two remaining red lines describe one standard deviation 
above and below the mean 
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with I q the modified Bessel function of the first kind of order q and At = ti — £j_i. The process x 
follows a noncentral chi-squared distribution x(ti)\x(ti-i) ~ \ 2 (2cxt i _ 1 , 2q + 2, 2u). 



To calibrate this model the Maximum Likelihood Method is applied: 



argmax log(£) (71) 

a>O,0>O,(T>O 

The Likelihood can be decomposed, as for general Markov processes 23 , according to Equation (9), as 
a product of transition likelihoods: 

n 

C = U fCIR (x(ti)\x(U-i);a, 9, a) (72) 

Since this model is also a fat tail mean reverting one, it can be calibrated to the EMU Corporate A 
rated 5-7Y index described in Section 6. The calibration is done by Maximum Likelihood. To speed up 
the calibration one uses as initial guess for a the AR(1) coefficient of the Vasicek regression (Equation 
(49)) and then infer 9 and a using the first two moments of the process. The long-term mean of the CIR 
process is 9 and its long-term variance is 9a 2 /(2a). Then: 

a = -Iog(6)/At (73) 
9 = E(x t ) (74) 
<j = ^/2a Var(x t )/9 (75) 



Code 13 MATLAB® MLE calibration of the CIR process. 



function ML_CIRparams = CIR_ cal ibr at i on C V_dat a , dt , params ) 

% ML_CIRparams = [alpha theta sigma] 

N=length (V_data) ; 
if nargin <3 

x = [ones(N-l.l) V_dat a ( 1 : N-i) ] ; 

ols = (x'*x)~(-l)*(x'*V_data(2:N)) ; 

m=mean CV_data) ; v=var C V_dat a ) ; 

params = [- log ( ols (2) ) / dt , m , sqrt (2*ols (2) * v/m) ] ; 

end 

options = optimset ( 1 MaxFunEvals ■ , 100000, 'Maxlter', 100000); 
ML_CIRparams = fmins ear ch C @FT_CIR_LL_Exact Full , params, options); 

function mil = FT_CIR_LL_ExactFull (params ) 

alpha = params C 1) ; teta = params (2) ; sigma = params (3); 
c = (2* alpha) / ( ( sigma "2) * ( 1 - exp (-alpha*dt ) ) ) ; 
q = ( (2*alpha*teta) / ( sigma"2) ) -1 ; 
u = c*exp(-alpha*dt)*V_data(l:N-l) ; 
v = c*V_data (2 : N) ; 

mil = - (N-l ) * log (c ) + sum (u+v-log (v . /u) *q/2 - . . . 

log(besseli(q , 2* sqrt (u.*v) ,l))-abs(real (2* sqrt (u.*v)))) ; 

end 

end 



The initial guess and fitted parameters for the EMU Corporate A rated 5-7Y index are the following: 

a Q = 0.8662 a = 1.2902 
O = 49.330 9 = 51.7894 
o-o = 4.3027 a = 4.4966 

Figure 19 shows how the EMU Corporate A rated 5-7Y index time series fits into the simulation of 
40,000 paths using Equation (65) starting with the first observation of the index (first week of 1998). 



Equation (65) shows that x(ti) depends only on x(tj_i) and not on previous values x(ii_2), 2(^—3)..., so CIR is Markov. 
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Figure 20: The tail of the EMU Corporate A rated 5-7Y index histogram (in blue) compared to the 
fitted Vasicek process probability density (in black), to the fitted Exponential Vasicek process probability 
density (in green) and to the fitted CIR process probability density (in red) 

As observed before, the CIR model has less fat tails than the Exponential Vasicek, but avoids the 
above-mentioned explosion problem of the log-normal processes bank account. Furthermore, when used 
as an instantaneous short rate or default intensity model under the pricing measure, it comes with 
closed form solutions for bonds and survival probabilities. Figure (20) compares the tail behaviour of 
the calibrated CIR process with the calibrated Vasicek and Exponential Vasicek processes. Overall, 
the exponential Vasicek process is easier to estimate and to simulate, and features fatter tails, so it is 
preferable to CIR. However, if used in conjunction with pricing applications, the tractability of CIR and 
non-explosion of its associated bank account can make CIR preferable in some cases. 

10 Mean Reversion and Fat Tails together 

In the previous sections, two possible features of financial time series were studied: fat tails and mean 
reversion. The first feature accounts for extreme movements in short time, whereas the second one 
accounts for a process that in long times approaches some reference mean value with a bounded variance. 
The question in this section is: can one combine the two features and have a process that in short time 
instants can feature extreme movements, beyond the Gaussian statistics, and in the long run features 
mean reversion? 

In this section one example of one such model is formulated: a mean reverting diffusion process with 
jumps. In other terms, rather than adding jumps to a GBM, one may decide to add jumps to a mean 
reverting process such as Vasicek. Consider: 



dx(t) = a(6 - x{t))dt + adW(t) + dJ t 



(76) 
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where the jumps J are defined as: 

N(t) 

J(t) = J2 Y i> dJ{t)=Y N[t) dN{t) 

3=1 

where N(t) is a Poisson process with intensity A and Yj are iid random variables modelling the size of 
the j-th jump, independent of N and W. 

The way the previous equation is written is slightly misleading, since the mean in the shocks in 
nonzero. To have zero-mean shocks, so as to be able to read the long-term mean level from the equation, 
one needs to rewrite it as: 

dx(t) = ol (9y — x(t)) dt + o-dW(t) + [dJ t - XE[Y] dt] (77) 

where the actual long term mean level is: 

6 Y ■= 9 + XE[Y]/a, 

and can be much larger than 9, depending on Y and A. For the distribution of the jump sizes, Gaus- 
sian jump size are considered, but calculations are possible also with exponential or other jump sizes, 
particularly with stable or infinitely divisible distributions. 

Otherwise, for a general jump size distribution, the process integrates into: 



x t =m x (x s ,t~ s) + ae- at f e az dW z + e- at f e az dJ z (78) 

J S J S 

m x (x„t - s) := 9 (l - e-°<'-') + z,e-""->, 

where m x and v x are, respectively, mean and variance of the Vasicek model without jumps. 

Notice that taking out the last term in (78) one still gets (48), the earlier equation for Vasicek. 
Furthermore, the last term is independent of all others. Since a sum of independent random variables 
leads to convolution in the density space and to product in the moment generating function space, the 
moment generating function of the transition density between any two instants is computed: 

M t | a (u)=E xW [exp(ua:(t))] (79) 

that in the model is given by 

M t]s {u) =eiq>(m x (x s ,t- s)u+ Vx ^~ ^ u 2 + ^M Y (ue- a(t ~ z) ) - l) dzj (80) 
where the first exponential is due to the diffusion part (Vasicek) and the second one comes from the 

24 

jumps . 

Now, recalling Equation (9), one knows that to write the likelihood of the Markov process it is enough 
to know the transition density between subsequent instants. In this case, such density between any two 
instants s = and t = U can be computed by the above moment generating function through inverse 
Laplace transform 25 . 

24 As stated previously, the characteristic function is a better notion than the moment generating function. For the 
characteristic function one would obtain: 

4> t]s (u) = exp {m x (x s ,t - s)iu - Vx<yt ~ S ^ u 2 + \ J ^ (flV(" e ~ Q(t_z) ) - l) dz | • (81) 

As recalled previously, the characteristic function always exists, whereas the moment generating function may not exist in 
some cases. 

25 Or better, from the above characteristic function through inverse Fourier transform. 
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Figure 21: EMU Corporate A rated 5-7Y index charts. The top chart compares the innovations fit of 
the Vasicek process to the fit of the Vasicek process with normal jumps (in red). The bottom left figure 
is the QQ-plot of the Vasicek process innovations against the historical data innovations. The bottom 
right figure is the QQ-plot of the Vasicek process with normal jumps innovations against the historical 
data innovations. 
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There are however some special jump size distributions where one does not need to use transform 
methods. These include the Gaussian jump size case, that is covered in detail now. 

Suppose then that the Jump size Y is normally distributed, with mean \iy and variance Oy. For 
small At, one knows that there will be possibly just one jump. More precisely, there will be one jump 
in(t — At, t] with probability AAt, and no jumps with probability 1 — AAt. 

Furthermore, with the approximation: 

/ e az dJ z m e Q(t ~ At) AJ(t), AJ(t) := J(t) - J(t - At). (82) 

J t-At 

one can write the transition density between s = t — At and t as: 



f x (t)\x(t-M){x) = A At f N (x; m x (At, x(t - At)) + ^y, v x (At) + a Y ) (83) 
+ (1 - XAt)f N (x;m x (At,x(t - At)),v x (At)) 

This is a mixture of Gaussians, weighted by the arrival intensity A. It is known that mixtures of 
Gaussians feature fat tails, so this model is capable of reproducing more extreme movements than the 
basic Vasicek model. 

To check that the process is indeed mean-reverting, in the Gaussian jump size case, through straight- 
forward calculations, one can prove that the exact solution of Equation (77) or equivalently Equation(76)) 
satisfies: 

E[x t ] =8 Y + (x - e Y )e- at (84) 

Var[x t ] = g2 + A( ^ + ^ ) (1 - e"**) (85) 

This way one sees that the process is indeed mean reverting, since as time increases the mean tends 
to Oy while the variance remains bounded, tending to (a 2 + X(fi Y + cr i-))/(2a). Notice that, either letting 
A or fj,y and ay going to zero (no jumps) gives back the mean and variance for the basic Vasicek process 
without jumps. 

It is important that the jump mean /iy not be too close to zero, or one gets again almost symmetric 
Gaussian shocks that are already there in Brownian motion and the model does not differ much from 
a pure diffusion Vasicek model. On the contrary, it is best to assume fj, Y to be significantly positive 
or negative with a relatively small variance, so the jumps will be markedly positive or negative. If one 
aims at modelling both markedly positive and negative jumps, one may add a second jump process J_ (t) 
defined as: 

dJ-(t) = Z M{t) dM{t) 

where now M is an arrival Poisson process with intensity A_ counting the jumps, independent of J, and 
Z are again iid normal random variables with mean iiz and variance <j 2 z , independent of M, J, W . The 
overall process reads: 

dx(t) = a {6- x{t)) dt + adW(t) + dJ(t) - dJ_ (t) or (86) 
dx{t) = a(6j - x(t))dt + adW(t) + [dJ(t) - \/j, Y db]- [dJ_(i) - \-(jl z dt], 

a 

This model would feature as transition density, over small intervals [s,t]: 

fx(t)\x(t-At)(x) = A At f N (x;m x (At,x(t - At)) + ^y,v x (At) + a Y ) (87) 
+A_ At f N (x; m x (At, x(t - At)) - pL Z , v x (At) + a%) 
+ (1 - (A + A_)At) f N (x; m x (At, x(t - At)),v x (At)) 

This is now a mixture of three Gaussian distributions, and allows in principle for more flexibility in 
the shapes the final distribution can assume. 
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Figure 22: The tail of the EMU Corporate A rated 5-7Y index histogram (in blue) compared to the fitted 
Vasicek process probability density (in green) and to the fitted Vasicek with jumps process probability 
density (in red) 



Again, one can build on the Vasicek model with jumps by squaring or exponentiating it. Indeed, even 
with positive jumps the basic model still features positive probability for negative values. One can solve 
this problem by exponentiating the model: define a stochastic process y following the same dynamics as 
x in (86) or (76), but then define your basic process x as x(t) = exp(y(t)). Then historical estimation 
can be applied working on the log-series, by estimating y on the log-series rather than on the series itself. 

From the examples in Figures 21 and 22 one sees the goodness of fit of Vasicek with jumps, and note 
that the Vasicek model with positive-mean Gaussian size jumps can be an alternative to the exponential 
Vasicek. Indeed, Figure 20 shows that the tail of exponential Vasicek almost exceeds the series, whereas 
Vasicek with jumps can be a weaker tail alternative to exponential Vasicek, somehow similar to CIR, as 
one sees from Figure 22. However, by playing with the jump parameters or better by exponentiating 
Vasicek with jumps one can easily obtain a process with tails that are fatter than the exponential Vasicek 
when needed. While, given Figure 20, it may not be necessary to do so in this example, in general the 
need may occur in different applications. 

More generally, given Vasicek with positive and in case negative jumps, by playing with the jump 
parameters and in case by squaring or exponentiating, one can obtain a large variety of tail behaviours 
for the simulated process. This may be sufficient for several applications. Furthermore, using a more 
powerful numerical apparatus, especially inverse Fourier transform methods, one may generalise the jump 
distribution beyond the Gaussian case and attain an even larger flexibility. 
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Notation 



General 

E Expectation 

Var Variance 

SDE Stochastic Differential Equation 

P(A) Probability of event A 

Mx (u) Moment generating function of the random variable X computed at u 

4>x(u) Characteristic function of X computed at u 

q Quantiles 

fx (x) Density of the random variable X computed at x 

Fx (x) Cumulative distribution function of the random variable X computed at x 

fx\Y=y{x) 

or fx\y{x; y): Density of the random variable X computed at x 

conditional on the random variable Y being y 
N(m, V) Normal (Gaussian) random variable with mean m and Variance V 

/at(x; to, V) Normal density with mean to and Variance V computed at x 

= ( 27r y)-l ex p(-^) 

$ Standard Normal Cumulative Distribution Function 

fp(x; L) Poisson probability mass at x = 0, 1, 2, . . . with parameter L 

= exp(-L) 

fr(x; k, C) Gamma probability density at x > with parameters k and (: 

= x*- 1 cx P (-x/c)/(C K r(K)) 

~ Distributed as 

Statistical Estimation 

CI, ACF, PACF Confidence Interval, Autocorrelation Function, Partial ACF 
MLE Maximum Likelihood Estimation 

OLS Ordinary Least Squares 

Parameters for MLE 

£(9)0i, ...,x n ):= fx u x 2 ,...,x n \e(xi, ■ ■ ■ ,x n ) or f Xl ,x 2 ,...,x n (xi, x n ; 6): 

likelihood distribution function with parameter in the assumed distribution, 
for random variables X\, . . . , X n , computed at x\, . . . , x n 

C*(Q)(xi, ...,x n ) :=log(£(0)(xi, . . . ,x„)), logdikelihood. 

Estimator for 

Geometric Brownian Motion (GBM) 

ABM Arithmetic Brownian Motion 

GBM Geometric Brownian Motion 

fi, (i Drift parameter and Sample drift estimator of GBM 

<7, <7 Volatility and sample volatility 

Wt or W(t) Standard Wiener process 

Fat Tails: GBM with Jumps, Garch Models 

Jt, J(t) Jump process at time t 

N t , N(t) Poisson process counting the number of jumps, 

jumping by one whenever a new jump occurs, with intensity A 
Yi, Y2, . . . , Yj, ... Jump amplitudes random variables 



Time, discretisation, Maturities, Time Steps 

At Time step in discretization 
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s, t, u Three time instants, < s < t < u. 

= to,ti, . . . ,U, . . . A sequence of equally spaced time instants, ti — tj_i = At. 

T Maturity 

n Sample size 

Variance Gamma Models 

VG Variance Gamma 

pi Log-return drift in calendar-time 

g(t) <~ r(t/^, v) Transform of calendar time t into market activity time 
9 Log-return drift in markct-activity-timc 

(7 volatility 

Vasicek and CIR Mean Reverting Models 

9, a, a Long-term mean reversion level, speed, and volatility 



